Hybrid Permanent/Reversible Dynamic Range Control System

ABSTRACT

A technique for controlling audio dynamic range in a manner that can be permanent, reversible, or anywhere in between, and can accomplish this goal in the baseband PCM or encoded domains.

This application claims priority to U.S. Provisional Application No. 61/175,853, filed May 6, 2009, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND OF THE INVENTION

This patent application describes a novel technique for controlling audio dynamic range in a manner that can be permanent, reversible, or anywhere in between, and can accomplish this goal in the baseband PCM or encoded domains.

Modern distribution of audio signals to consumers necessarily involves the use of data rate reduction or audio compression techniques to lower the amount of data required to deliver these audio signals to consumers while causing minimal impact to the original audio quality. Systems including AC-3, DTS, MPEG-2 AAC and HE AAC are examples of common audio data reduction techniques. For the purposes of this invention, only the AC-3 system will be used as an example, but the invention is applicable to any coding system and is applicable to television, radio, internet, or any other means of program distribution or transmission.

Audio metadata, also known as data about the audio data, is also included with these systems to describe the encoded audio. This data is multiplexed in with the compressed audio data and delivered to consumers where it is extracted and applied to the audio in a user-adjustable manner.

One such metadata parameter is called dialnorm and is intended to control the average loudness of a program.

Other parameters such as dynrng and compr, collectively referred to as DRC, are intended to control program dynamic range.

Programs are in many cases produced with loudness and dynamic range that vary to convey emotion or the level of excitement in a given scene, while interstitial or commercial material is very often produced to convey a message and may be at a constant loudness.

In some cases these program and commercial elements can differ substantially in average loudness and dynamic range, and many consumer environments are not conducive to large changes in loudness or dynamic range.

Artistic intent, while perhaps appropriate in more carefully controlled situations, can cause audibility problems and result in viewer or listener complaints. This is commonly referred to as the “loud commercial problem” but can be caused as much by excessive dynamic range as by mismatched loudness.

An additional complicating factor is the desire, and sometimes the legal requirement, to maintain the integrity of the original audio, as some viewers and even regulatory bodies may require that the program audio not be changed in any way. Because of this, processes applied to the audio should be reversible.

Prior art has described two general types of systems capable of controlling audio dynamic range. The first type comprises AGC-type systems that detect and adjust the level of applied audio signals in a permanent and non-reversible manner, effectively controlling loudness shifts and dynamic range to a degree acceptable to most consumers. An example of this type of system is a standard transmission processor commonly found in analog broadcast facilities, the details of which are common knowledge to those skilled in the art.

The second type comprises systems that use side-chain data or metadata to allow the original audio to be carried to consumers and be modified by the metadata to match the requirements of individual consumers, allowing a reasonable degree of control to be applied to the reproduced audio signal, or allowing the audio signal to be reproduced in its original form with no control applied. An example of the latter system can be found in the AC-3 system.

The current invention offers a hybrid of the two approaches, allowing a continuously variable choice of which method is being applied, from permanent to reversible.

SUMMARY OF THE INVENTION

The current invention provides a method whereby the dynamic range of an input audio signal can be modified in a permanent or reversible manner, or in an infinitely adjustable hybrid between permanent and reversible.

In one embodiment, the invention discloses a method for controlling the dynamic range of an audio signal in a hybrid permanent/reversible manner, the method comprising:

applying original audio to a detector and generating a control signal;

applying the same original audio to a first gain control element;

producing a permanently controlled output signal by varying this first gain control element with the control signal to raise or lower the level of the signal so that the loudest and quietest parts are brought closer to a target level;

applying the same control signal to a block formatter to match the capabilities of an audio encoder;

creating an inverse of this block formatted gain control signal;

passing this inverse block formatted signal through a control element to allow all, some, or none of the inverse block formatted signal to pass;

producing “remainder audio” by applying the permanently controlled output signal to a second gain control element to “un-apply” the actions of the original gain control within the boundaries of the block formatted signal;

applying this remainder audio to an audio encoder; delaying the non-inverse block based control signal; and

using this delayed version of the non-inverse block based control signal as part of the encoding process representing one or more metadata elements;

when delivered to a corresponding decoder along with the remainder audio, reversible gain control can then be applied, somewhat applied, or not applied at all.

Other prior work has described methods where the dynamic range of an applied audio signal can be directly and permanently adjusted by detecting the level of the audio signal and generating a control signal that is used to adjust the gain of the audio higher if it is lower than some reference or to adjust the gain of the audio signal lower if it is higher than some reference, a process commonly known as Automatic Gain Control (AGC).

Still other prior work has described methods where the dynamic range of an applied audio signal can be indirectly and reversibly adjusted by detecting the level of the audio signal and generating a control signal that is passed as metadata along with the original audio to some receiving or decoding device where the control signal can be applied directly to adjust the gain of the audio higher if it is lower than some reference or to adjust the gain of the audio signal lower if it is higher than some reference. This control signal can also be scaled before application to produce less or more control of the audio signal, or the control signal can be ignored, thus resulting in no change to the original audio. One use of this process is described in ATSC Standard A/52: Digital Audio Compression (AC-3).

The current invention is fundamentally different from other prior work in that it is a hybrid between permanent change to applied audio and change that is reversible, and allows selection of any combination of the two approaches, thus providing a minimum and maximum dynamic range on a continuously adjustable basis.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various example systems, methods, and so on, that illustrate various example embodiments of aspects of the invention. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the figures represent one example of the boundaries. One of ordinary skill in the art will appreciate that one element may be designed as multiple elements or that multiple elements may be designed as one element. An element shown as an internal component of another element may be implemented as an external component and vice versa. Furthermore, elements may not be drawn to scale.

FIG. 1 illustrates an example traditional AGC System.

FIG. 2 illustrates an example metadata-based AGC system.

FIG. 3 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 4 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 5 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 6 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 7 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 8 illustrates a block diagram of an example multiband AGC system.

DETAILED DESCRIPTION

FIG. 1 depicts a traditional AGC system where input audio (1) is passed to a detector (2) and to a variable gain element (3), and the detector creates a control signal (4) which is fed to the control input of the variable gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference, therefore producing an output (5) where the lowest and highest levels are closer to each other, thus lowering the dynamic range. It should be noted that this type of AGC is commonly known as a feed-forward AGC, and that an alternate version where the control signal is detected after the gain element and fed back to the gain control element is commonly known as a feed-back AGC. Either of these methods should be seen as systems that permanently change the audio.
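The feed-forward arrangement of FIG. 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `feed_forward_agc`, the one-pole level detector, and the `target_db`, `ratio`, and `smoothing` parameters are hypothetical choices and are not taken from this specification.

```python
import math

def feed_forward_agc(samples, target_db=-20.0, ratio=2.0, smoothing=0.9):
    """Return (output_samples, gains_db) for a simple feed-forward AGC.

    All parameter names and default values are illustrative assumptions.
    """
    level_db = target_db  # smoothed level estimate, starts at the reference
    out, gains_db = [], []
    for x in samples:
        # Detector: instantaneous level in dBFS, floored to avoid log(0).
        inst_db = 20.0 * math.log10(max(abs(x), 1e-6))
        level_db = smoothing * level_db + (1.0 - smoothing) * inst_db
        # Control signal: negative gain above the reference, positive below.
        gain_db = (target_db - level_db) * (1.0 - 1.0 / ratio)
        gains_db.append(gain_db)
        # Variable gain element: apply the control signal to the same input.
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out, gains_db
```

With these assumed defaults, a sustained full-scale input settles at roughly 10 dB of attenuation and a sustained input at -40 dBFS settles at roughly 10 dB of boost, so the two levels end up closer together.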

FIG. 2 depicts a simplified metadata-based AGC system where input audio (1) is detected (2) and the control data (3) multiplexed (4) with the audio data. This composite data stream (5) is sent to a demultiplexer (6) which outputs the audio data and control data. The multiplexer and demultiplexer are generally known to be parts of systems such as digital television encoders and decoders. The control data can then be selectively used to vary a gain element (7) to adjust the level of the audio signal and control the dynamic range. This control signal can also be scaled to apply more or less control or can be ignored completely (8), allowing the original audio to be reproduced unmodified, and this method can be considered one that is reversible.
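The decoder side of FIG. 2 can be illustrated with a short Python sketch. It assumes, purely for illustration, that the control data arrives as one gain value in dB per block of audio and that the listener's choice is a single `scale` factor; the function name `apply_drc_metadata` is hypothetical.

```python
def apply_drc_metadata(audio_blocks, gain_words_db, scale=1.0):
    """Apply per-block gain metadata to decoded audio.

    scale = 1.0 applies the control data fully, 0.0 ignores it (original
    audio is reproduced), and values in between apply it partially.
    The data layout (one dB value per block) is an assumption.
    """
    out = []
    for block, gain_db in zip(audio_blocks, gain_words_db):
        linear = 10.0 ** (scale * gain_db / 20.0)
        out.append([sample * linear for sample in block])
    return out
```

Because the audio itself is delivered unmodified and the gain is applied only at reproduction, setting `scale` to zero returns the original signal, which is what makes this approach reversible.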

FIG. 3 depicts one embodiment of the current invention. Input audio (1), which can be a single channel, stereo, or as shown 5.1 channels, is applied to an AGC means (2). The AGC means operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference, therefore moving the lowest and highest levels closer to each other and thus outputting audio (3) with an adjusted dynamic range to a variable gain element (4). The control signal developed for the AGC process is also output (5) and applied to a block formatter (6) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (7) and applies them to the control input of the gain element after passing through a control element (8) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (9) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. This audio is applied to an encoder (10), such as one described in ATSC A/52, and the block-based gain control signal (11) is first delayed (12) and then is multiplexed (13) into the encoded bitstream as gain control words such as compr, dynrng and/or dialnorm as described in ATSC A/52.
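The block formatter, inverse, control element, and second gain element of FIG. 3 can be sketched as follows. This is a minimal sketch under assumptions that are not stated in the specification: the block length `BLOCK`, the quantization step `STEP_DB`, averaging as the block formatting rule, and the helper names are all hypothetical. It also assumes that the metadata carries only the portion of the block gain that was un-applied, which is one reading consistent with the later statement that the control data is constant in the fully permanent case.

```python
BLOCK = 4        # samples per gain block (hypothetical value)
STEP_DB = 0.25   # gain word resolution in dB (hypothetical value)

def block_format(gains_db, block=BLOCK, step_db=STEP_DB):
    """Reduce per-sample AGC gains to one quantized gain word per block."""
    words = []
    for start in range(0, len(gains_db), block):
        chunk = gains_db[start:start + block]
        mean_db = sum(chunk) / len(chunk)
        words.append(round(mean_db / step_db) * step_db)
    return words

def make_remainder(agc_audio, words_db, amount, block=BLOCK):
    """Un-apply `amount` (0.0 to 1.0) of the block gain from AGC output."""
    out = []
    for i, sample in enumerate(agc_audio):
        inverse_db = -words_db[i // block]   # inverse of the block gain
        passed_db = amount * inverse_db      # control element: all/some/none
        out.append(sample * 10.0 ** (passed_db / 20.0))
    return out

def metadata_words(words_db, amount):
    """Gain words to multiplex with the remainder audio (assumed rule)."""
    return [amount * word for word in words_db]
```

With `amount` set to 1.0 the remainder audio is the original audio to within the block formatting accuracy and all control is carried as metadata; with 0.0 the remainder is the fully processed audio and the metadata is a constant 0 dB; intermediate values split the control between the two.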

FIG. 4 depicts another embodiment of the current invention. Input audio (1), which can be a single channel, stereo, or as shown 5.1 channels, is applied to an AGC means (2) which operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference, therefore moving the lowest and highest levels closer to each other and thus outputting audio (3) with an adjusted dynamic range to a variable gain element (4). The control signal developed for the AGC process is also output (5) and applied to a block formatter (6) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (7) and applies them to the control input of the gain element after passing through a control element (8) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (9) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. This audio is applied to an encoder (10), such as one described in ATSC A/52, and the block-based gain control signal (11) is first delayed (12) and then input to the encoder as a metadata signal (13).

FIG. 5 depicts yet another embodiment of the current invention. Input audio (1) is in the AC-3 encoded form and is first applied to an AC-3 decoder (2) to produce decoded PCM audio signals (3) which can be mono, stereo or 5.1 channels as shown. These audio signals are then applied to an AGC means (4) which operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference, therefore moving the lowest and highest levels closer to each other and thus outputting audio (5) with an adjusted dynamic range to a variable gain element (6). The control signal developed for the AGC process is also output (7) and applied to a block formatter (8) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (9) and applies them to the control input of the gain element after passing through a control element (10) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (11) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. This audio is applied to an encoder (12), such as one described in ATSC A/52, and the block-based gain control signal (13) is first delayed (14) and then is multiplexed (15) into the encoded bitstream as gain control words such as compr, dynrng and/or dialnorm as described in ATSC A/52.

FIG. 6 depicts yet another embodiment of the current invention. Input audio (1) is in the AC-3 encoded form and is applied both to a delay means (2) and to an AC-3 decoder (3) to produce decoded PCM audio signals (4) which can be mono, stereo or 5.1 channels as shown. These audio signals are then applied to an AGC means (5) which operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference, therefore moving the lowest and highest levels closer to each other and thus outputting audio (6) with an adjusted dynamic range to a variable gain element (7). The control signal developed for the AGC process is also output (8) and applied to a block formatter (9) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (10) and applies them to the control input of the gain element after passing through a control element (11) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (12) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. This audio is applied to an AC-3 encoder (13), such as one described in ATSC A/52, and the block-based gain control signal (14) is first delayed (15) and then is sent with the delayed original AC-3 input signal (16) and the newly created AC-3 signal (17) to the multiplexer (18). It is then possible to compare and modify the original encoded audio data blocks to more closely match the newly encoded data blocks to allow for a more accurate representation of the so-called remainder audio, essentially allowing audio modification without fully decoding and re-encoding.

FIG. 7 depicts still yet another embodiment of the current invention. Input audio (1) is in the AC-3 encoded form and is applied both to a delay means (2) and to an AC-3 decoder (3) to produce decoded PCM audio signals (4) which can be mono, stereo or 5.1 channels as shown. These audio signals are then applied to an AGC means (5) that detects the input audio and generates a control signal (6) that is applied to a block formatter (7) which will create gain control values on a block basis, matching the capabilities of the final encoder. This block formatted control signal (8) is applied with the delayed original AC-3 input signal (9) to the multiplexer (10) where existing compr, dynrng and/or dialnorm control words will be replaced. This method allows for insertion of gain control information into a previously encoded bitstream without the need to decode and re-encode the signal.
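The replacement step performed at the multiplexer of FIG. 7 can be illustrated at a conceptual level. The sketch below does not parse a real AC-3 bitstream: each already-encoded block is modeled as a hypothetical Python dict with a `payload` entry for the coded audio and a `dynrng` entry for the existing gain word, and `replace_gain_words` is an assumed helper name. A real implementation would instead have to rewrite the corresponding bit fields and, presumably, update any error-check words that cover them.

```python
def replace_gain_words(blocks, new_dynrng_db):
    """Return copies of `blocks` with their dynrng entry replaced.

    `blocks` is a hypothetical list of dicts, one per encoded audio block.
    The coded audio payload is carried through untouched.
    """
    if len(blocks) != len(new_dynrng_db):
        raise ValueError("need exactly one gain word per block")
    return [dict(block, dynrng=word)
            for block, word in zip(blocks, new_dynrng_db)]
```

The point of the sketch is that only the metadata entries change; the coded audio is neither decoded for output nor re-encoded.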

FIG. 8 depicts a more sophisticated AGC means where the input audio (1) is first adjusted in average level by Input AGC (2), then is split into a multiplicity of bands by crossovers (3), shown here as five bands although any number of bands can be used, and each band then has its own AGC (4) specifically optimized for the range of frequencies it is controlling. Each band of frequencies is then applied to its own limiter (5) and then the bands are summed (6) and applied to an overall peak limiter (7). Each of these sections (2), (4), (5), and (6), also outputs a control signal, all of which are summed into a final composite control signal (8). The functionality of this drawing can be inserted as the AGC means shown on any of the other drawings in the description of this invention.
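The summation into a composite control signal can be sketched as below. This assumes, for illustration only, that every section reports its control signal as a gain in dB at a common sample rate, so that summing in dB corresponds to cascading the gains; the specification does not say how the per-band signals are weighted, and `composite_control` is a hypothetical name.

```python
def composite_control(stage_gains_db):
    """Sum the per-stage control signals (in dB), sample by sample.

    `stage_gains_db` is a list of equal-length lists, one per section.
    """
    return [sum(values) for values in zip(*stage_gains_db)]
```

For example, an input AGC contributing -3 dB, a band AGC contributing +1 dB, and a limiter contributing -0.5 dB at the same instant yield a composite value of -2.5 dB.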

It should be noted that the invention described here can work alone or in tandem with additional audio processing, and can operate in the baseband PCM or compressed domains such as AC-3, DTS, MPEG, and others via standard gain adjustments or metadata manipulation.

It should be noted that this process can operate in real time, faster than real time, or slower than real time, in a software, hardware, or hybrid software/hardware implementation.

It should be noted that unlike prior art, implementation of this invention allows for control of dynamic range in a reversible manner, in a permanent manner, or anywhere in between reversible and permanent. In the reversible manner, adjustments made to the audio are done via control data sent alongside the original audio in the form of metadata which can be applied fully, in a scaled manner, or not at all, but where the original audio is delivered separately and intact. In the permanent manner, the audio is fully processed before encoding and the control data sent alongside the audio is fixed at a constant value such that there will be no difference between applying it fully or not applying it at all. In the hybrid case, part of the adjustment of the audio is done in a permanent manner, while the remaining part is done in a reversible manner, allowing partial reversibility.
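These three cases can be summarized with a simple gain model. It is offered as an illustrative sketch rather than as language from the specification: the symbols are assumed names, with x the original audio, g the AGC gain, \hat{g}_b the block formatted gain for block b, \alpha the setting of the control element, and \beta the scaling chosen at the decoder. The model also assumes that the metadata carries the portion of the block gain that was un-applied.

```latex
\begin{align*}
y[n] &= g[n]\,x[n] && \text{permanently controlled audio} \\
r[n] &= \hat{g}_b^{-\alpha}\,y[n], \quad 0 \le \alpha \le 1 && \text{remainder audio, for } n \text{ in block } b \\
m_b &= \hat{g}_b^{\alpha} && \text{gain word carried as metadata} \\
z[n] &= m_b^{\beta}\,r[n] = \hat{g}_b^{\alpha(\beta - 1)}\,g[n]\,x[n], \quad 0 \le \beta \le 1 && \text{decoder output}
\end{align*}
```

Under this model, \alpha = 0 gives z[n] = g[n] x[n] for every \beta, which is the permanent case with a constant metadata value of unity. With \alpha = 1, the decoder output ranges from approximately x[n] at \beta = 0 to g[n] x[n] at \beta = 1, which is the reversible case. Intermediate values of \alpha give the hybrid case, in which only part of the control can be undone at the decoder.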

While example systems, methods, and so on, have been illustrated by describing examples, and while the examples have been described in considerable detail, it is not the intention of the applicants to restrict or in any way limit scope to such detail. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the systems, methods, and so on, described herein. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention is not limited to the specific details, the representative apparatus, and illustrative examples shown and described. Thus, this application is intended to embrace alterations, modifications, and variations that fall within the scope of the appended claims. Furthermore, the preceding description is not meant to limit the scope of the invention. Rather, the scope of the invention is to be determined by the appended claims and their equivalents.

To the extent that the term “includes” or “including” is employed in the detailed description or the claims, it is intended to be inclusive in a manner similar to the term “comprising” as that term is interpreted when employed as a transitional word in a claim. Furthermore, to the extent that the term “or” is employed in the detailed description or claims (e.g., A or B) it is intended to mean “A or B or both”. When the applicants intend to indicate “only A or B but not both” then the term “only A or B but not both” will be employed. Thus, use of the term “or” herein is the inclusive, and not the exclusive, use. See Bryan A. Garner, A Dictionary of Modern Legal Usage 624 (2d ed. 1995).

1. A method for controlling the dynamic range of an audio signal in a hybrid permanent/reversible manner, the method comprising: applying original audio to a detector and generating a control signal; applying the same original audio to a first gain control element; producing a permanently controlled output signal by varying this first gain control element with the control signal to raise or lower the level of the signal so that the loudest and quietest parts are brought closer to a target level; applying the same control signal to a block formatter to match the capabilities of an audio encoder; creating an inverse of this block formatted gain control signal; passing this inverse block formatted signal through a control element to allow all, some, or none of the inverse block formatted signal to pass; producing “remainder audio” by applying the permanently controlled output signal to a second gain control element to “un-apply” the actions of the original gain control within the boundaries of the block formatted signal; applying this remainder audio to an audio encoder; delaying the non-inverse block based control signal; and using this delayed version of the non-inverse block based control signal as part of the encoding process representing one or more metadata elements.